Combining Sentiment Lexica with a Multi-View Variational Autoencoder
When assigning quantitative labels to a dataset, different methodologies may
rely on different scales. In particular, when assigning polarities to words in
a sentiment lexicon, annotators may use binary, categorical, or continuous
labels. Naturally, it is of interest to unify these labels from disparate
scales, both to achieve maximal coverage over words and to create a single,
more robust sentiment lexicon, while retaining scale coherence. We introduce a
generative model of sentiment lexica to combine disparate scales into a common
latent representation. We realize this model with a novel multi-view
variational autoencoder (VAE), called SentiVAE. We evaluate our approach via a
downstream text classification task involving nine English-language sentiment
analysis datasets; our representation outperforms six individual sentiment
lexica, as well as a straightforward combination thereof.
Comment: To appear in NAACL-HLT 2019
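The mapping the abstract describes, from binary, categorical, and continuous polarity labels to a common latent representation, can be made concrete with a toy decoder over a three-way (positive, negative, neutral) simplex. This is a hand-written illustration of the multi-view idea, not SentiVAE's learned model:

```python
import numpy as np

def decode_views(theta):
    """Decode a shared latent point on the (pos, neg, neu) simplex into the
    three label scales found in sentiment lexica. Illustrative rules only:
    SentiVAE *learns* per-view decoders rather than hard-coding them."""
    pos, neg, neu = theta
    binary = 1 if pos > neg else -1                 # binary polarity view
    categorical = ["positive", "negative", "neutral"][int(np.argmax(theta))]
    continuous = pos - neg                          # continuous score in [-1, 1]
    return binary, categorical, continuous

# A word whose latent mass leans positive decodes consistently on all scales.
b, c, s = decode_views(np.array([0.7, 0.1, 0.2]))
```

Because every view is decoded from the same latent point, words annotated on different scales become directly comparable, which is what lets a combined lexicon cover the union of the input vocabularies.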
The SIGMORPHON 2019 Shared Task: Morphological Analysis in Context and Cross-Lingual Transfer for Inflection
The SIGMORPHON 2019 shared task on cross-lingual transfer and contextual
analysis in morphology examined transfer learning of inflection between 100
language pairs, as well as contextual lemmatization and morphosyntactic
description in 66 languages. The first task extends past years' inflection
tasks by examining transfer of morphological inflection knowledge from a
high-resource language to a low-resource language. This year also presents a
new second challenge on lemmatization and morphological feature analysis in
context. All submissions featured a neural component and built on either this
year's strong baselines or highly ranked systems from previous years' shared
tasks. Every participating team improved in accuracy over the baselines for the
inflection task (though not in Levenshtein distance), and every team in the
contextual analysis task improved on both state-of-the-art neural and
non-neural baselines.
Comment: Presented at SIGMORPHON 2019
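The two metrics mentioned above, accuracy and Levenshtein distance, can be computed for an inflection system as follows; the word pairs in the usage line are invented examples, not shared-task data:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning
    a into b, via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute ca -> cb
        prev = cur
    return prev[-1]

def evaluate(gold, pred):
    """Exact-match accuracy and mean edit distance over predicted forms."""
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    dist = sum(levenshtein(g, p) for g, p in zip(gold, pred)) / len(gold)
    return acc, dist

# Invented predictions: one exact match, one near miss.
acc, dist = evaluate(["walked", "sang"], ["walked", "singed"])
```

A system can thus beat a baseline on accuracy while losing on Levenshtein distance: wrong answers that are far from the gold form drag the mean distance up even when the exact-match rate improves.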
On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs
We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns. For all six languages, we find that there is a statistically significant relationship. We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. We defer deeper investigation of these relationships to future work.
ISSN: 2307-387
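One standard information-theoretic quantity for such a test is the mutual information between a noun's grammatical gender and the adjective lemmas it co-occurs with, estimated from a contingency table of corpus counts. The sketch below is illustrative; the abstract does not spell out the paper's exact estimator or significance test:

```python
import numpy as np

def mutual_information(table):
    """Mutual information (bits) between the row variable (grammatical
    gender) and the column variable (adjective lemma), estimated from a
    gender-by-adjective co-occurrence table. MI = 0 iff the empirical
    distributions are independent."""
    joint = table / table.sum()
    row = joint.sum(axis=1, keepdims=True)   # marginal over genders
    col = joint.sum(axis=0, keepdims=True)   # marginal over adjectives
    nz = joint > 0                           # skip empty cells (0 log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (row @ col)[nz])).sum())

# Toy tables: rows = {masculine, feminine}, columns = two adjective lemmas.
independent = np.array([[10., 10.], [10., 10.]])  # gender tells us nothing
skewed      = np.array([[20.,  0.], [ 0., 20.]])  # gender determines the adjective
```

In practice a permutation or chi-squared test on the same table would establish whether the observed value is statistically significant, since mutual information estimated from finite counts is biased upwards.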
On the distribution of deep clausal embeddings: a large cross-linguistic study
Paper presented at: 57th Annual Meeting of the Association for Computational Linguistics, held 28 July to 2 August 2019 in Florence, Italy.
Embedding a clause inside another (“the girl [who likes cars [that run fast]] has arrived”) is a fundamental resource that has been argued to be a key driver of linguistic expressiveness. As such, it plays a central role in fundamental debates on what makes human language unique and how it might have evolved. Empirical evidence on the prevalence and the limits of embedding has, however, been based on either laboratory setups or corpus data of relatively limited size. We introduce here a collection of large, dependency-parsed written corpora in 17 languages that allow us, for the first time, to capture clausal embeddings through dependency graphs and assess their distribution. Our results indicate that there is no evidence for hard constraints on embedding depth: the tail of the depth distributions is heavy. Moreover, although deeply embedded clauses tend to be shorter, suggesting processing-load issues, complex sentences with many embeddings do not display a bias towards less deep embeddings. Taken together, the results suggest that deep embeddings are not disfavoured in written language. More generally, our study illustrates how resources and methods from latest-generation big-data NLP can provide new perspectives on fundamental questions in theoretical linguistics.
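Capturing clausal embedding through dependency graphs can be sketched by counting clause-introducing arcs on each token's path to the root. The relation inventory below follows Universal Dependencies, and the toy parse of the abstract's example sentence is an assumption for illustration, not the paper's extraction pipeline:

```python
# Clause-introducing relations in Universal Dependencies (subtypes such as
# acl:relcl are matched by their base label).
CLAUSAL = {"acl", "ccomp", "xcomp", "advcl", "csubj"}

def max_embedding_depth(heads, deprels):
    """Maximum clausal embedding depth in one sentence.
    heads: 1-based head index per token (0 = root); deprels: relation labels."""
    def depth(i):
        d = 0
        while heads[i] != 0:
            if deprels[i].split(":")[0] in CLAUSAL:
                d += 1                      # crossed a clause boundary
            i = heads[i] - 1                # move up to the head token
        return d
    return max(depth(i) for i in range(len(heads)))

# Toy parse of "the girl who likes cars that run fast has arrived":
# two stacked relative clauses give depth 2.
heads   = [2, 10, 4, 2, 4, 7, 5, 7, 10, 0]
deprels = ["det", "nsubj", "nsubj", "acl:relcl", "obj",
           "nsubj", "acl:relcl", "advmod", "aux", "root"]
```

Running a counter like this over a dependency-parsed corpus yields the depth distribution whose heavy tail the abstract reports.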